Abstract: Spoken content retrieval refers to indexing and retrieving spoken content directly from the audio rather than from text descriptions. This potentially eliminates the need to produce text descriptions of multimedia content for indexing and retrieval, and makes it possible to locate the exact time at which the desired information appears in the multimedia. Spoken content retrieval has been achieved very successfully with the basic approach of cascading automatic speech recognition (ASR) with text information retrieval: after the spoken content is transcribed into text or lattice format, a text retrieval engine searches the ASR output for the desired information. The latent Dirichlet allocation (LDA) algorithm is used for clustering. LDA is a generative statistical model that allows sets of observations to be explained by unobserved groups that account for why some parts of the data are similar. Given the ASR output, the algorithm clusters that content and returns the expected result; that is, after converting the audio to text, we apply LDA to cluster the input data.
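As a concrete illustration of the clustering step described above, the following is a minimal pure-Python sketch of LDA via collapsed Gibbs sampling applied to tokenized transcripts. The function name `lda_gibbs`, the toy "transcripts", and the hyperparameter values are illustrative assumptions, not the system's actual implementation:

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents (toy sketch)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w2i = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # per-document topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # per-topic word counts
    nk = [0] * n_topics                        # per-topic token totals
    z = []                                     # topic assignment for each token
    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w2i[w]] += 1
            nk[k] += 1
        z.append(zd)
    # Gibbs sweeps: resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], w2i[w]
                ndk[d][k] -= 1; nkw[k][wi] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wi] += 1; nk[k] += 1
    # Cluster label for each document = its dominant topic.
    labels = [max(range(n_topics), key=lambda t: row[t]) for row in ndk]
    return labels, ndk, vocab

# Hypothetical ASR transcripts: two sports-like and two weather-like documents.
docs = [
    "goal match player score goal team".split(),
    "team player match goal score match".split(),
    "rain cloud storm wind rain forecast".split(),
    "forecast wind storm cloud rain cloud".split(),
]
labels, ndk, vocab = lda_gibbs(docs, n_topics=2)
```

In this sketch each transcript receives the topic it uses most often as its cluster label; a production system would typically use an established library implementation (e.g. scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel`) rather than hand-rolled sampling.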

Keywords: Spoken content retrieval, spoken term detection, query by example, semantic retrieval, joint optimization, pseudo-relevance feedback, graph-based random walk, unsupervised acoustic pattern discovery, query expansion, interactive retrieval, summarization, key term extraction.